Foreword


Currently available commercial data bases are screened according to their ability
to meet DMA’s future geographic names processing requirements. Vendor comments,
current applications, and level of user satisfaction are reported. These criteria will
be applied when selecting a data base management system to support DMA’s geographic
names data base.


R.  R  Onorati,  Captain,  USN 
Commanding  Officer,  NORDA 


Executive  summary 
The Defense Mapping Agency (DMA), with the help of the Naval Ocean Research
and Development Activity (NORDA), Code 351, is conducting a large effort to
automate production of maps, gazetteers, charts, and similar products. One element
of this effort is the creation of a single, controlled source for the placenames used
in these products: the Geonames Data Base. Because of its size (up to 10^10 bytes)
and user requirements (up to 25 simultaneous users), the Geonames Data Base belongs
to a special subset of current data base management practice: the Very Large Data Base.
From requirements generated earlier by DMA, NORDA, and others, this task
surveys commercial data base management systems (DBMS) to determine candidates
for the Geonames DBMS. These systems are then ordered by objective and subjective
measures of performance to help in selecting the Geonames DBMS. Among
the systems which may satisfy the requirements of the Geonames Data Base, two
are used widely today and are exceptionally well regarded by current users. These
systems are

* IDMS/R, Cullinet

* ADABAS, Software AG of North America
Commercial data base management systems


1.  Introduction 

Background 

The Defense Mapping Agency (DMA) is developing
automated tools to aid production of maps, charts, gazetteers,
and similar products. One key element to successfully developing
these tools is having a single, controlled source of digital
placenames: the Geonames Data Base.

Currently there is no analog to the Geonames Data Base.
Gazetteers are primarily collected from the Foreign Place Names
File (FPNF) containing about 4.5 million names and associated
data stored on index cards. These names represent features appearing
on 1:230,000 scale maps. The names for new maps are
taken primarily from old maps, at scales as large as 1:100,000
or 1:30,000, accounting for features at far greater detail and,
thus, many more names. The total number of names from these
and other sources is expected to be about 60 million.

As envisioned, the Geonames Data Base will collect these
names into a single digital resource of up to 133 data bases,
depending on the implementation scheme adopted. All cartographic
and toponymic applications will use this common base,
thus providing DMA the opportunity to control accuracy, consistency,
and agreement among these data products. The job
of controlling the base, maintaining data integrity, and other
functions to be discussed belongs to the data base management
system (DBMS), the software that manipulates and provides access
to the data base.

Data base management systems may be either designed and
built from scratch or acquired and adapted from commercial
sources or agencies that have developed similar systems. Developing
a DBMS from scratch for the Geonames Data Base is a high-risk
task, particularly when the designer must deliver high-speed
transactions and a high level of data integrity with a very large
data base. Well-established DBMS products have long since completed
this phase of development and associated risk.

For most applications, and certainly for the Geonames Data
Base, adapting an existing system is more cost efficient. The
marketplace for DBMS has grown at a rate of 30% per year
over the past seven years (McClellan, 1984), supplying a large
number of products that have been developed, tested, and refined
on other users’ applications. This market has been relatively
efficient in turning academic ideas into specific realizations when
doing so has meant satisfying customer demand and improving
performance. Many products (as discussed later) are able
to satisfy most of the demands of the Geonames Data Base,
and a few handle all the requirements articulated so far.


The  characteristics  of  the  Geonames  Data  Base  have  been 
described  by  Brown  et  al.  (1983),  Langran  (1984)  and  Langran 
et  al.  (1984).  From  these  descriptions  we  know  the  base  to  be 

* large,

*  multiuser, 

*  dominated  by  production  functions. 

We also know that building the data base is the largest technical
and practical problem facing the Geonames Data Base. Very
few of the projected 60 million placenames are now in digital
form. Data capture techniques are being developed in parallel
with this project, which will increase the rate of data entry,
currently to be accomplished by hand. Still, Langran (1984)
estimates 10 years to bring the base to a full 60 million records.
This slow rate of entry of new data implies

* data independence is critical: Everything (hardware, software,
etc.) may change over the next 10 years, but the data
base must not be allowed to become obsolete.

* data integrity is crucial: Users must always trust the base.

* backup is imperative: The base must never be lost.

* the INSERT function of the DBMS is initially important:
A consideration for benchmark.

The high value of the data base is a common element to almost
all DB applications, and large commercial systems address these
issues.

Purpose 

The purpose of this task is to determine which, if any, commercial
DBMS products satisfy the stated requirements for the
Geonames Data Base.

Approach 

The  task  can  be  considered  to  be  composed  of  five  subtasks: 

*  Determine  if  commercial  DBMS  practice  will  support 
Geonames  Data  Base  applications. 

*  Establish  a  list  of  candidate  DBMS  for  the  Geonames  Data 
Base  from  the  first  subtask. 

* Rank candidate systems accordingly if objective performance
measures are available.

* Incorporate subjective performance measures where
available.

*  Recommend  a  course  of  action. 

The methods employed in accomplishing the subtasks are
described in Section 3 of the report. Briefly, these methods included
literature search, vendor interviews, vendor promotional
literature, published user surveys, and Planning Systems, Inc.
(PSI) user surveys.


Organization  of  this  report 

This  report  is  organized  to  follow  the  subtasks  described  in 
the  preceding  paragraph.  First,  we  establish  that  the  Geonames 
DBMS  as  presently  described  is  within  current  commercial 
DBMS practice, although because of its size and access requirements it
belongs to a special subset of the market. To do this we describe
how  the  views  of  users,  administrators,  and  designers  impose 
constraints  on  any  DBMS  that  is  used  with  very  large  data  bases, 
not  just  the  Geonames  application.  The  commercial  data  base 
market  is  then  briefly  introduced  and  shown  to  meet  similar 
concerns. 

In  Section  3  the  list  of  all  DBMS  available  on  the  market 
is  trimmed  by  two  levels  of  discrimination,  leaving  a  list  of 
systems  that  appears  to  support  the  application.  In  Section  4 
we  attempt  to  order  the  remaining  systems  by  objective  and 
quantitative  measures  of  performance,  but  find  that  no  such 
measures  exist  independently  of  a  specific  data  base  design  and 
implementation. We are thus limited in our attempt to make
a quantitative evaluation. We then rely on subjective assessments,
such as published user surveys, and our subjective reading of
the literature to rank candidate systems and make our
recommendation.

2.  Commercial  DBMS 
support  for  the  Geonames 
Data  Base  application 

The  Geonames  Data  Base 

The  Geonames  Data  Base  belongs  to  a  special  subset  of  data 
base  practice  because  of  its  projected  very  large  size  and  multiuser 
environment.  These  two  factors  impose  most  of  the  meaningful 
constraints  that  must  be  used  when  selecting  a  system.  The 
eventual  size  of  the  data  base  has  been  projected  to  be 

* 10^10 bytes,

* 60 x 10^6 records,

* 153 different bases corresponding to different geographic
areas or gazetteers.
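
As a quick consistency check on these projections, the implied average record size can be worked out directly. The sketch below assumes a projected size of 10^10 bytes and 60 million records, the two figures quoted in this report; the variable names are illustrative only.

```python
# Projected totals for the Geonames Data Base, as quoted in the report.
total_bytes = 10**10          # 10^10 bytes
total_records = 60 * 10**6    # 60 x 10^6 records

# Average storage available per placename record, overhead included.
bytes_per_record = total_bytes / total_records
print(round(bytes_per_record))  # roughly 167 bytes per record
```

On these assumptions each placename and its associated data would average somewhat under 200 bytes, including whatever indexing overhead the chosen DBMS adds.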

The  size  of  this  data  base  suggests  many  potential  problems  as 
discussed  by  Brown  et  al.  (1983).  Among  these  problems  are 

*  difficulties  in  building  a  large  base. 

*  difficulties  in  maintaining  a  large  base. 

*  controlling  potentially  long  search  times. 

*  access  and  maintenance  problems  caused  by  scattering  the 
base  over  many  disk  storage  devices. 

These  and  many  similar  unidentified  issues  must  be  raised  and 
specifically  addressed  in  the  data  base  design  process  before  a 
practical  solution  can  be  implemented  in  a  DBMS.  The  DBMS 
is  not  the  solution  to  these  problems;  rather,  it  is  the  tool  through 
which  the  solution  obtained  in  the  design  process  is  implemented. 
Without that solution the tools can only be judged either
abstractly or with respect to their performance in other similar
applications. The DBMS can be separated into two sets: those
able to efficiently operate on data volume as large as the
Geonames Data Base in a multiuser, multiple storage unit environment,
and those unable to do so. Answers to more specific
questions must await design and benchmark.

In  addition  to  the  size  of  the  base,  up  to  25  users  must  be 
able  to  simultaneously  access  the  system  without  undue  delay. 
This  suggests  many  more  problems  (Brown  et  al.,  1983)  including 

• data base integrity (concurrency controls),

• security (in the weak sense of individual user files and system
function),

• conflicts in the design process between ease of use, flexibility,
and speed.

The system must also support such I/O devices as printers, plotters,
work stations, and displays. The level of support provided
by the target operating system (in terms of user scheduling,
LOGON/LOGOFF, and access to the CPU and storage devices)
will affect the degree of administrative processing overhead the
DBMS will absorb. In addition, the system must provide utilities
for DB maintenance, security, access, reporting, etc., for all users.
Again, without a specific design it is difficult to quantitatively
assess the efficiency of commercial DBMS products. Instead,
we classify systems as either capable of such tasks at such a
scale or otherwise not qualified.

Other constraints on the Geonames DBMS derive from the
collective expectations of the people involved with the Geonames
Data Base rather than from the necessary form of the base.
These people include

• users (toponymists, cartographers);

• designers;

• data base administrator.

Their  views  have  a  practical  impact  on  DBMS  selection  that 
cannot  be  ignored  and  must  be  accounted  for  in  the  DB  design 
process.  Some  of  these  concerns  have  been  suggested  by  Brown 
et  al.  (1983)  and  by  Langran  et  al.  (1984). 

The  user's  view  of  the 
Geonames  Data  Base 

The hierarchy of users of the Geonames Data Base is described
by Langran et al. (1984). We are particularly interested in system
performance as viewed by the Applications Analysts, i.e., the
cartographers and toponymists. Brown et al. (1983) described
five classes of users and 15 example queries. These users include

• toponymic queries: concerned with lexical attributes of a
geoname or sets of geonames.

• gazetteer production: concerned with all geonames in a
country whose feature attributes meet certain well-defined
criteria (e.g., population size).

• map production: concerned with name, location, and ancillary
information needed to determine type size, format,
etc., of geonames.

• outside queries: sundry requests by non-DMA personnel.
Queries may be any sort regarding names.

• data base builders: massive updates to the data base corresponding
to a newly digitized map, etc.

The 15 queries collectively exhibit a broad range of expectations
from the data base. Large extractions supporting production
of maps or gazetteers suggest standard applications packages
written to navigate the base quickly over predetermined paths.
Other queries such as “Extract all items in a 2° x 2° box, except
for waterways” or “Extract all geonames of a certain form
which came from a given reference source” seem to demand
a DBMS sufficiently flexible as to allow the user to interact
with the Geonames Data Base directly without formal help from
an applications programmer.
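
To make the second kind of request concrete, the following is a minimal sketch of how the 2° x 2° box query might look to a user working directly with a query language. It assumes a hypothetical table `geonames` with invented columns and sample rows, and it uses SQL through Python's `sqlite3` module purely for illustration; the report does not prescribe a schema, a query language, or a product.

```python
import sqlite3

# Hypothetical schema and invented sample rows; not taken from the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE geonames ("
    " name TEXT, lat REAL, lon REAL, feature_class TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO geonames VALUES (?, ?, ?, ?, ?)",
    [
        ("Alpha Town", 10.5, 20.5, "populated place", "map sheet A"),
        ("Beta River", 11.0, 21.0, "waterway", "map sheet A"),
        ("Gamma Peak", 11.9, 21.9, "mountain", "map sheet B"),
        ("Delta City", 15.0, 25.0, "populated place", "map sheet B"),
    ],
)

def names_in_box(conn, south, west, size_deg=2.0, exclude="waterway"):
    """Return names inside a size_deg x size_deg box, skipping one feature class."""
    rows = conn.execute(
        "SELECT name FROM geonames"
        " WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?"
        " AND feature_class <> ? ORDER BY name",
        (south, south + size_deg, west, west + size_deg, exclude),
    )
    return [r[0] for r in rows]

box_names = names_in_box(conn, south=10.0, west=20.0)
print(box_names)
```

The point of the sketch is that the user states what is wanted (a box and an exclusion) and leaves the access path to the DBMS, which is the flexibility the sample queries seem to require.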

Whether these classes of users and sample queries accurately
address the real or perceived needs of the applications analysts
should be determined in detail as an early part of the Geonames
Data Base design process. The Comprehensive Coordination Plan
(Brown et al., 1983) emphasizes the many assumptions made
to achieve estimates of performance for planning. While these
assumptions were appropriate for preliminary analysis, they may
or may not accurately reflect the expectations of the Applications
Analysts. Closing this loop between ultimate user and DB
designer has proven to be absolutely vital to implementing and
operating a successful DBMS. Although we do not expect our
conclusions to change as a result of this process, the DBMS
should be reviewed with respect to the new list of transactions
to be performed.

Constraints  imposed  by  the  user 

Collectively,  the  user  base  seems  to  demand 

•  ease  of  use, 

• fast response.

Unfortunately, these attributes are normally mutually exclusive,
and in a large, complex environment such as the Geonames Data
Base tradeoffs between them are hard to establish. Instead,
strategies must be developed that tend to support both objectives.
Relational data structures and powerful query languages
are generally conceded to make user interaction easier, though
slower. Navigation aids such as inverted pointer tables, hashing,
or binary trees can be developed to enhance speed for predetermined,
production-oriented queries regardless of the data model
chosen.
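
One of the navigation aids named above, the inverted table, can be illustrated with a short sketch. The records, field names, and functions below are invented for the example and do not describe any particular product; the sketch only contrasts a full scan with a lookup through a prebuilt index.

```python
from collections import defaultdict

# Invented sample records: (record id, name, feature class).
records = [
    (1, "Alpha Town", "populated place"),
    (2, "Beta River", "waterway"),
    (3, "Gamma Town", "populated place"),
]

def build_inverted_index(records):
    """Map each feature class to the ids of the records that carry it."""
    index = defaultdict(list)
    for record_id, _name, feature_class in records:
        index[feature_class].append(record_id)
    return index

def scan(records, feature_class):
    """Baseline: examine every record, so cost grows with data base size."""
    return [rid for rid, _name, fc in records if fc == feature_class]

index = build_inverted_index(records)
# A lookup through the index touches only the matching entries.
fast = index["populated place"]
slow = scan(records, "populated place")
print(fast, slow)
```

Both paths return the same record ids. The index trades extra storage and extra work at insert time for a retrieval cost that no longer depends on scanning the whole base, which is why such aids suit predetermined production queries.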

The DBMS should, therefore, be flexible enough to reconcile
such hybrid approaches to improved performance and be
tolerant of different views of the DB among the users.

The  DB  Administrator’s  view  of  the 
Geonames  Data  Base 

The Data Base Administrator (DBA) is at the top of the
Geonames Data Base personnel hierarchy, and his concerns
necessarily include those of all the functions under his authority.
He is, however, exclusively concerned with the administration
of the DB, including

* building and maintaining the Geonames Data Base,

* controlling user interaction with the Geonames Data Base.

These responsibilities tend to give the Administrator a different
view of the data than that taken by the users of the system.
The administrator is more likely to view the data base as a capital
investment (which it is) rather than as just another mapmaking
tool for the cartographer (which it also is). The base’s value
depends on its accuracy and completeness as a whole, and a
substantial effort to assure those qualities is justified.

Constraints  imposed  by  the 
DB  Administrator 

The DBA expects the DBMS to provide the tools (integrity
checks, security, audit trails, back-up utilities, etc.) that
give him the means to measure system performance,
trace transaction histories, and recover from minor disasters.
Major  disasters  must  never  happen. 

Langran  et  al.  (1984)  discuss  the  constraints  imposed  on  the 
DBMS  by  the  DBA: 

Data Integrity: A data base is useless if the accuracy or validity
of the stored data is questionable. The DBMS should allow
the definition of data elements both by type (e.g., integer, character
string) and by range limits of acceptable values, and have facilities
to allow applications programmers to impose additional validation
checks as required. Hardware or data transmission errors
should be detected and flagged, and failures during a processing
sequence should be rolled back to the last correctly processed
record with appropriate error message diagnostics. The
data relationships (e.g., parent/child one-to-many) established by
the DBA should be protected from unauthorized modification,
and mandatory relationships should be enforced.
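
The integrity requirements can be sketched in a few lines. The example below assumes a hypothetical table with type and range limits declared as SQL `CHECK` constraints, and a loader that stops at the first invalid record; it uses Python's `sqlite3` module only as a convenient stand-in and is not a statement about how any candidate DBMS implements these features.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Type and range limits declared with the data definition (hypothetical schema).
conn.execute(
    "CREATE TABLE geonames ("
    " name TEXT NOT NULL,"
    " lat REAL NOT NULL CHECK (lat BETWEEN -90 AND 90),"
    " lon REAL NOT NULL CHECK (lon BETWEEN -180 AND 180))"
)

def load_batch(conn, batch):
    """Insert records one at a time and stop at the first invalid record.

    Returns the number of records committed, so the caller knows the
    last correctly processed record.
    """
    loaded = 0
    for record in batch:
        try:
            conn.execute("INSERT INTO geonames VALUES (?, ?, ?)", record)
            conn.commit()
            loaded += 1
        except sqlite3.IntegrityError as err:
            conn.rollback()  # discard only the failed record
            print(f"record {loaded + 1} rejected: {err}")
            break
    return loaded

# Invented sample batch; the second record has an out-of-range latitude.
batch = [("Alpha Town", 10.5, 20.5), ("Bad Point", 123.0, 20.5), ("Gamma", 1.0, 2.0)]
loaded = load_batch(conn, batch)
stored = conn.execute("SELECT COUNT(*) FROM geonames").fetchone()[0]
print(loaded, stored)
```

Here the first record is kept, the second is rejected with a diagnostic, and processing halts with the base left at the last correctly processed record.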

Physical Data Protection: The volume of data in the
Geonames Data Base will make frequent back-up copying infeasible.
The DBMS should provide update facilities that do
not require manual re-entry of changes for both “father” and
“grandfather” back-up copies in the event of media or storage
device catastrophe. Accurate records of physical media storage
contents should be maintained.

Data Security: Facilities to prevent unauthorized users from
modifying data or reading sensitive data should be supported.
Access restrictions to the level of data item (not just to files
or records) are required.

Data Independence: To preserve flexibility for future
enhancements, the DBMS should provide data independence
between applications software and the physical structure of the
data base. It should be possible to modify the physical structure
without affecting either the logical structure (the user’s view
of data base) or previously written data access programs.

Management Information Statistics: Information on current
system functions, changes in data volume, and measures
of DBMS efficiency over time is required by the DBA to monitor
system performance and to aid in planning system resource allocations.
Identification of on-line users, current processing functions,
and system hardware allocations should be accessible interactively.
Information on volumes (number of elements), access
frequencies, and measures of update volatility of various
data base components should be made available to system
operators.

Tracking trends in data base growth and retrieval efficiency
will allow the DBA to trade off storage allocations, access
algorithms, and user demands in tuning the DBMS for optimal
performance. Advance warning of processing bottlenecks will
facilitate system adaptability to change.

Audit trail functions, including records to identify the user
who inserted or modified a data element, when this occurred,
and references to special circumstances of the event are required
to support some types of toponymic inquiries. This information
is not required to be stored on-line or available for general
access, and occasional requests of this type are expected to be
processed in batch mode. System software to automatically capture
most of this audit information during data base loading
is desirable.
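
A minimal sketch of the kind of audit record described here follows. The field names, the in-memory list standing in for an off-line store, and the sample users are all assumptions made for illustration; the requirement itself only names who, when, and any special circumstances.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEntry:
    user: str          # who inserted or modified the element
    action: str        # "insert" or "modify"
    element: str       # which data element was touched
    when: datetime     # when the change occurred
    note: str = ""     # reference to special circumstances, if any

audit_log = []  # stands in for an off-line, batch-processed store

def record_change(user, action, element, note=""):
    """Capture one audit entry automatically at the time of the change."""
    entry = AuditEntry(user, action, element, datetime.now(timezone.utc), note)
    audit_log.append(entry)
    return entry

record_change("analyst1", "insert", "name:Alpha Town")
record_change("analyst2", "modify", "name:Alpha Town", note="spelling corrected")

# A batch-mode inquiry: who has touched this element?
users = [e.user for e in audit_log if e.element == "name:Alpha Town"]
print(users)
```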

The  designer's  view  of  the 
Geonames  Data  Base 

The architecture of the Geonames Data Base is the key and
missing element that connects the various requirements to the
data and involves the practical problems posed by the need to
devise logical data structures and translate them to physical structures,
to devise access strategies, to assure data independence,
etc. How well the design is accomplished probably has more
impact on the ultimate system performance than the machine,
the operating system, or the DBMS.

Methodologies for data base design abound, and a thorough
review of that technology and its application to design of the
Geonames Data Base is beyond the scope of this effort. Whether
the design data model should be network or relational is a matter
for the designer to determine after more detailed specifications
of the requirements are developed and the computational
environment decided. Only then can intelligent design decisions
be made.

Design  constraints 

In  the  absence  of  specific  design  requirements,  a  designer  will 
choose  the  DBMS  that  has  the  most  tools  and  places  the  fewest 
restrictions  on  the  DB  implementation.  The  designer  will  be 
concerned  about 

• performance,

• storage efficiency,

• retrieval efficiency,

• data independence, etc.

The DBMS should provide both batch and interactive access
to the base, navigation aids to increase speed, easy access to
records distributed across multiple storage units, facilities to provide
data migration and data clustering, etc. The DBMS should
have an approach flexible enough to allow the designer whatever
logical data model he chooses, or even to mix models within
the base. These requirements are within current commercial
practice.

The  three  data  models  most  commonly  used  by  DB  designers 
are 

•  relational, 

•  network, 

•  hierarchical. 

A DB using the relational model is made up of flat files and
a management system that recombines the data elements to form
different files (Martin, 1977). This type of DBMS has the following
strengths:

• user determines the view,

• ease of use,

• mathematically elegant,

• data model is simple.

Because only data is stored in this type of DB, the users can
select how they wish to organize or view the data. Since the
data records do not contain pointers that “point” to other
records, the user does not have to know how to maneuver
through the data base. The relational data model uses very few
data structures, composition rules, and attributes, making it an
elegant model (as defined by McGee, 1976). The data is stored
in only one type of record, making the data more easily
understood.

Although  a  relational  DBMS  has  the  above  advantages,  it 
also  has  the  following  disadvantages: 

•  slow, 

•  larger  storage  requirements. 

A relational DBMS is able to create additional files by finding
the common data element in more than one file and combining
the data from different files. This implies that the same
data element is stored more than once, using additional storage.
Although there is no theoretical reason why a relational DBMS
should be significantly slower than systems using other data
models, it is widely reported that this is the case (Martin, 1977;
McGee, 1976; Larson, 1983).
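
The recombination of flat files on a common data element can be sketched as a join. The two tables, the shared element `feature_code`, and the sample codes below are invented for the example; `sqlite3` is used only as a convenient relational engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two flat files (tables) sharing the data element feature_code (hypothetical).
conn.execute("CREATE TABLE names (name TEXT, feature_code TEXT)")
conn.execute("CREATE TABLE features (feature_code TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?)",
    [("Alpha Town", "PPL"), ("Beta River", "STM")],
)
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [("PPL", "populated place"), ("STM", "stream")],
)

# The join recombines the two files on the common element into a new view.
combined = conn.execute(
    "SELECT n.name, f.description FROM names AS n"
    " JOIN features AS f ON n.feature_code = f.feature_code"
    " ORDER BY n.name"
).fetchall()
print(combined)
```

Note that `feature_code` is stored in both tables, which is the duplicated storage referred to above, and that the match is found by comparing values at query time rather than by following a stored pointer.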

A DB using the network data model has “child” records
in a structure with more than one “parent” record. The term
“child” means that the data stored in record A is referenced
or pointed to from the data stored in record B, the parent. In
the case of the network model, one child record may have many
parent records. This type of data model has the following
advantages:

• speed,

• well-documented,

• widely used,

and has the following disadvantages:

• complex,

• difficult to implement,

• difficult to use,

• difficult to update.

Because of the use of pointers in the network model, the
DBMS usually provides high performance. This type of data
model has been used for many years to achieve the performance
needed for very large data bases; as a result, it is used widely
and is well-documented. This approach is appropriate for systems
with clearly defined data relationships and query requirements,
and where changes to record structures are not anticipated.

The child/parent records in a network DB make the base
difficult to understand. If the data base is large, as in the case
of the Geonames Data Base, then there may be many child/parent
relationships that need to be tracked. This complexity makes
the base both difficult to implement and difficult to use. Also,
if the data base is moved from one storage device to another
the pointers stored in the DB must be changed, making updates
difficult.
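
A small sketch may help show what the stored pointers look like. The `Record` class, the `link` helper, and the sample records are invented for illustration, and in-memory object references stand in for the physical pointers a network DBMS would store on disk.

```python
class Record:
    """A record that keeps explicit pointers to its parents and children."""

    def __init__(self, key):
        self.key = key
        self.parents = []
        self.children = []

def link(parent, child):
    """Store the relationship in both records, as pointers."""
    parent.children.append(child)
    child.parents.append(parent)

# Invented example: one placename owned by two parent records.
country = Record("country: X")
map_sheet = Record("map sheet: 12")
name = Record("name: Alpha Town")
link(country, name)
link(map_sheet, name)

# Navigation follows pointers directly, with no search over the base.
parent_keys = [p.key for p in name.parents]
print(parent_keys)
```

Retrieval is fast because each step is a pointer dereference, but every relationship must be maintained by hand in both directions, and relocating a record invalidates the pointers that refer to it.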

The hierarchical data model is similar to the network model
except that each child record is allowed only one parent.
Although this reduces the complexity of the DB design, it also
makes the DB more inflexible and more difficult to change.

The relative advantages and disadvantages among the three
available data models do not immediately suggest one method
over the others in the Geonames Data Base application. The
conceptual (and practical) ease of the relational data model is
very appealing with respect to user access to the base, but the
observed burden on system performance is problematic for the
Geonames Data Base. Network systems resolve performance problems
but are more complex both conceptually and in implementation.
Some commercial DBMS vendors have resolved this conflict
over ease-of-use versus performance by either applying performance
enhancements to relational structures (e.g., INGRES)
or imposing relational data models on essentially network DBMS
(e.g., IDMS/R).

Some of the techniques used to enhance performance of relational
bases include

* execute DB functions in hardware,

* B tree storage techniques,

* hashing storage techniques,

* use of inverted files,

* optimizing queries,

* use of virtual space.

Modifying a network data base to allow for the use of the
relational data model is a complex and product-dependent issue.
Primarily, it involves allowing the user to define the data relationships
and have the system determine how to implement the
relationships.

Another option for achieving both high performance and ease
of use is to design parallel data bases that use two different
DBMSs. A disadvantage to this approach is that the DB administrator
must maintain updates to both data bases. IBM offers
a DBMS using the hierarchical data model, IMS, and a DBMS
using the relational data model, SQL. IBM also supports software
to transfer data between each type of base. This allows
for the use of a high-speed DB for production runs using the
hierarchical model and a very flexible relational DB for all other
needs.


Commercial data base
management systems

DBMS for large DB/multiusers

Data base applications with the size requirements of the
Geonames Data Base have been operationally successful since
1968 when POLAR, the Production Order Location and Reporting
System, was implemented for the Apollo program using IMS.
By 1969 this system consisted of 50 data bases spread over 32
disk packs and supported 130 terminals (Grafton, 1983). Since
then Very Large Data Bases (VLDB) have been implemented
to support many large-scale projects, and the technology to support
the VLDB has been expanded to include distributed data
bases (Rothnie and Goodman, 1977; Polk and Byrd, 1981) and
data base machines (Hsiao, 1979). These advances, furthermore,
are being translated into commercial products and novel applications,
as witnessed by the recent implementation by Products Diversified
of Houston of an on-line real estate DBMS of 15 x 10^6
records using Britton Lee's IDM data base machine with Alpha
Micro front end processors (News item, Datamation, March
1984).

As shown in Section 3 of this report, several, though not all,
of the commercial DBMSs that operate on hardware/OS that
support large on-line storage capacity and multiple users have
chosen also to support VLDB. These systems make up the set
of feasible solutions to the Geonames Data Base.

Constraints imposed by the Geonames
Data Base typical of other DB

The constraints imposed on the Geonames DBMS by users,
administrators, and designers are common to most other similar
systems. Every DBMS must provide similar capabilities to its
users.

Commercial market for DBMS

The commercial market for DBMS has been efficient in turning
new academic ideas into improved products. The successful
vendors in this market are primarily small, young companies
that concentrate their efforts into a limited range of software
products. These vendors tend to be responsive to market pressures
for new capabilities (McClellan, 1984). Data base products introduced
using network data models, e.g., IDMS, have been
modified to allow relational data models while still supporting
network and hierarchical structures. Mixed data models are now
common. Every product has implemented one or more
theoretical approaches to increasing performance in terms of
speed. This strong market benefits all potential users and the
Geonames Data Base application in particular.

3.  Selecting  candidate 
DBMS 

Approach 

We  now  must  assemble  a  list  of  products  from  this  marketplace 
that  match  the  Geonames  Data  Base  application.  To  do  so  we 
need  both  a  list  of  products  and  their  technical  descriptions  as 
well  as  a  set  of  discriminators  to  disqualify  the  products  which 
are  not  likely  to  serve  the  task. 

The list of products is easily made from proprietary software
reviews, e.g., Datapro, Auerbach, Data Sources, etc., and trade
journals. Technical data from these product reviews were
supplemented by our own phone survey of vendors (Appendix A).

Since no specific design for the Geonames Data Base is available,
we look for discriminators to disqualify commercial DBMS from
contention for the Geonames DBMS. Taken together, size and
number of users provide the best discriminator. Although there
are hundreds of DBMSs on the market, only a handful operate

•  on hardware powerful enough to support 10^10 bytes of on-line,
direct access storage;

•  under operating systems capable of reaching extended
addresses;

•  on systems (hardware and OS) capable of simultaneously
serving 25 users regardless of their abilities to meet other
DBMS goals.
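
The three host-system conditions above can be read as a simple filter. The following is only a minimal sketch of that reading: the `HostSystem` record, its field names, and the three example hosts are hypothetical and are not taken from the survey data.

```python
from dataclasses import dataclass

REQUIRED_BYTES = 10**10   # on-line, direct access storage (from the text)
REQUIRED_USERS = 25       # simultaneous users (from the text)


@dataclass(frozen=True)
class HostSystem:
    """Hypothetical description of a hardware/OS combination."""
    name: str
    max_online_bytes: int
    extended_addressing: bool
    max_simultaneous_users: int


def passes_first_level(host: HostSystem) -> bool:
    """Return True only if the host meets all three conditions."""
    return (
        host.max_online_bytes >= REQUIRED_BYTES
        and host.extended_addressing
        and host.max_simultaneous_users >= REQUIRED_USERS
    )


# Illustrative hosts only; the capacities are invented for the example.
hosts = [
    HostSystem("hypothetical mainframe", 4 * 10**10, True, 200),
    HostSystem("hypothetical single-user micro", 10 * 10**6, False, 1),
    HostSystem("hypothetical 24-user mini", 2 * 10**10, True, 24),
]

survivors = [h.name for h in hosts if passes_first_level(h)]
print(survivors)
```

A DBMS would then stay in contention only if at least one of its host systems passes this check.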

Much of the market growth of the past few years has supported
single-user operating systems (CP/M, MS-DOS, etc.) accessing
on-line storage of less than 10 x 10^6 bytes. Likewise,
systems targeted to small- and medium-sized businesses, scientific
or numerical data bases, or specialized applications can be
dismissed. Systems implemented exclusively on obsolete hardware
(e.g., DEC System 10), systems implemented on older
minicomputers with limited interfaces, or inadequate operating
systems (e.g., HP-3000) can also be dismissed without being
troubled by the details of their implementation or the quality
of user satisfaction; they simply will not do the job.

First  level  of  discrimination: 
Eliminate  DBMS  whose  host 
systems  do  not  support 
Geonames  Data  Base  requirements 

The  first  level  discriminators  were  applied  to  the  set  of  all 
commercial  DBMS  products  identified  in  the  literature  search. 


When in doubt at this level of discrimination, the DBMS was
passed into the list. Table 1 shows a list of 43 systems passed
into a closer analysis as initial candidates for the Geonames DBMS.
These are systems that run on host systems thought capable
of supporting the Geonames Data Base. When these vendors were
contacted by Planning Systems, Inc. (PSI), 11 disqualified their
products for reasons listed in Table 2.


Table 1. Systems passing the first level of discrimination.

Product Name          Company Name

ADABAS                Software AG
AIM/RDB               Fujitsu
ARC                   Data Point
BASIS-DM              Battelle Columbus Labs
CLIO                  United Systems Software and Services
DATACOM/DB            Applied Data Research
DBMS                  ISE
DB Machine            Maga/Net
DM IV                 Honeywell
DMS II                Burroughs Corporation
ENCOMPASS             Tandem
EXPRESS               Management Decisions
FOCUS                 Information Builders
FULL-RDM              International Tech
GYPSY                 University of Oklahoma
IDM                   Britton Lee
IDMS                  Cullinet
IMAGE                 HP
INFO                  HENCO
INGRES                Relational Technology
INQUIRE               Infodata
ITX                   NCR
MAXXIMUM              California Software Products, Inc.
MODEL 204 DBMS        Computer Corporation of America
ORACLE                Relational Software
PAC III               AGS Management Systems, Inc.
PLUS/4                Century Analysis, Inc.
RAMIS II/RELATE       MPG
RAPPORT               Logics
RD4                   Hitachi
RELIANCE PLUS         Perkin Elmer
RIM                   Boeing Computer Services
SEED                  Seed Software
SIR/DBMS              Scientific Information Retrieval
SQL, DB2, DL1, IMS    IBM
SQL/UNIVERSE          INCO
SUPERSETUP            The Automated Quin, Inc.
SYSTEM 2000           INTEL
TOTAL                 CINCOM
VAX-11 DBMS           DEC

Table 2. Products excluded on vendors' recommendations.

Product Name    Reason for exclusion

AIM/RDB         Not currently for sale
BASIS-DM        Not currently for sale
EXPRESS         Not full DBMS
FOCUS           Base too large for product
FULL-RDM        Base too large for product
GYPSY           Not full DBMS
MAXXIMUM        Base too large for product
PAC III         Not full DBMS
PLUS/4          Base too large for product
RD4             Not currently for sale
SUPERSETUP      Base too large for product

Second  level  of  discrimination: 
Eliminate  DBMS  that  do  not  support 
Geonames  Data  Base  requirements 

The second-level discriminators were derived from specific
constraints already imposed on the Geonames DBMS by the
functional design specifications. These constraints include the
following:

•  size,
   - 10^10 bytes
   - 60 x 10^6 records
   - multiple on-line disk packs

•  multiple data bases (150),

•  automatic backups,

•  security,

•  relational data model with performance enhancement.

This last criterion is PSI's reconciliation of the conflicting
requirements for ease of use and high speed.
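
To give a feel for the size constraints, the figures above can be combined into rough per-record and per-data-base averages. This is only back-of-the-envelope arithmetic: it assumes the 10^10 bytes and 60 x 10^6 records are spread evenly over 150 data bases, which the report does not state.

```python
TOTAL_BYTES = 10**10          # size constraint from the text
TOTAL_RECORDS = 60 * 10**6    # record-count constraint from the text
DATA_BASES = 150              # number of data bases from the text

# Average figures under the (assumed) even split.
bytes_per_record = TOTAL_BYTES / TOTAL_RECORDS
records_per_base = TOTAL_RECORDS // DATA_BASES
bytes_per_base = TOTAL_BYTES / DATA_BASES

print(f"average record size : {bytes_per_record:.0f} bytes")
print(f"records per data base: {records_per_base:,}")
print(f"bytes per data base  : {bytes_per_base / 1e6:.1f} x 10^6")
```

Under these assumptions an average record is on the order of 170 bytes and an average data base holds several hundred thousand records, so the difficulty lies in the total volume and the number of data bases rather than in any single record.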

Table 3 establishes the relationship of the remaining 30 systems
with the criteria established in the second level of discrimination.
A review of Table 3 shows that nine DBMS products should
be considered as candidates for the Geonames DBMS. These
systems are listed below.

Product Name      Company Name

ADABAS            Software AG of North America
CLIO              United Systems Software and Services
IDM               Britton Lee
IDMS/R            Cullinet
IMS/DL1           IBM
INGRES            Relational Technology
Model 204 DBMS    Computer Corp. of America
SEED              Seed Software
TOTAL             Cincom

Table  3.  Results  of  the  second  level  of  discrimination. 
4. Performance measures

Objective  performance  measures 

Having established a list of nine candidate systems, we would
like to establish a set of objective measures of performance to
judge the best for the Geonames DBMS. Unfortunately, no such
objective measure can be determined. To do so would require
the following:

•  The hardware/OS must be determined.

•  The data base design must be completed.

•  A realization of the DB and the DBMS must at least be
modeled and perhaps implemented for benchmark.

Other objective experiences in benchmarks for other system
selections should be consulted when available, but caution must be
exercised. All such benchmarks are conducted against some
specific data base realization. How these data bases are
constructed is usually beyond the scrutiny of secondary users of
the benchmark, and so the results cannot be judged objectively.

Objective  measures  of  the  Geonames  DBMS  will  be  made 
only  when  the  data  base  design  is  completed. 

Subjective  performance  measures 

Without an objective measure of performance, our final level
of discrimination must be subjective. We choose to look at the
candidate systems and the companies offering them through the
collective eyes of their customers. Two recent surveys of DBMS
users, one by Data Decisions for Datamation and the other by
Datapro, show similar results: Cullinet and its IDMS product,
and Software AG and its ADABAS product, are regarded highly
by users. This result is reflected in the market share captured
by these two companies over the past seven years (McClellan,
1984).

In  December  1983,  Datamation  published  the  results  of  a 
software  survey  conducted  by  Data  Decisions  in  July  1983. 
Shown  are  the  customer  satisfaction  scores  for  six  of  the  prod¬ 
ucts  that  passed  the  previous  screening.  The  scores  shown  are 
relative  to  a  maximum  of  10. 


    IDMS      7.7
    ADABAS    7.7
    TOTAL     7.1
    IMS       6.5
    SQL       6.1
    DL1       5.7

Only four DBMS products received higher scores:

    SAS           9.0   by SAS
    SYSTEM 1022   8.0   by Software House
    DMSII         7.9   by Burroughs
    IMAGE         7.8   by Hewlett Packard

One DBMS received a score equal to IDMS and ADABAS:

    FOCUS         7.7   by Information Builders

SAS by SAS, Inc., according to the technical description in
Datapro, is not a complete DBMS but is a report writer with
IMS doing the DBMS functions. SYSTEM 1022 by Software
House runs only on the DEC System 10 computer, for which
DEC is reducing support. DMSII by Burroughs did not pass
the previous screening discussed in this report because it is not
a relational DBMS. IMAGE by Hewlett Packard runs on the
HP-3000, which can support only 24 users. FOCUS by Information
Builders was disqualified by the vendor because FOCUS
is not designed for a data base of the expected size of the
Geonames Data Base.

The vendors surveyed by Data Decisions were selected from
a list supplied by International Computer Programs. This list
contained software vendors with gross sales of more than $3
million and hardware manufacturers with at least 30 user sites.
These vendors supplied Data Decisions with lists of the most
recent 125 customers using their product. Vendors were requested
not to contact the users they supplied, and Data Decisions verified
this. Questionnaires were sent to the key vendor contacts, usually
the data processing managers. If necessary, telephone interviews
were conducted to ensure there were at least 15 responses per package.
Ratings were based on the following scheme:

    9-10   Superior
    6-8    Very Good
    3-5    Acceptable
    1-2    Inadequate

Tables 4 and 5 show further details reported by Data Decisions
in Datamation. Overall satisfaction of product includes
such factors as package features, capabilities, and utility and
frequency of failure requiring extra effort for recovery. Overall
satisfaction of support reflects the user's appraisal of installation,
documentation, modification, and training. Performance
economy/efficiency includes such factors as hardware resource
utilization, ease of use, freedom from bugs and errors, and time
required for initial installation. Vendor support gauges the
vendor's responsiveness to user needs, effectiveness of training, and
quality of documentation. Operation is a measure of the package's
ability to handle expanding processing volumes, backup/recovery,
and security.

Datapro Research conducted a software survey during 1982
with the cooperation of Computerworld and the assistance of
McGraw-Hill Research. Listed below are the four products that
have passed the previous screenings and were included in the
Datapro survey. The scores shown are the average overall
satisfaction ratings on the scale of 4 used by Datapro, and the
same ratings translated to a scale of 10 for comparison to the
Data Decisions scores.

    IDMS      3.4 out of 4.0    8.5 out of 10.0
    ADABAS    3.1 out of 4.0    7.8 out of 10.0
    IMS       3.0 out of 4.0    7.5 out of 10.0
    TOTAL     2.1 out of 4.0    5.3 out of 10.0
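
The translation from the 4-point scale to the 10-point scale appears to be a simple proportional rescaling (multiply by 10/4) rounded to one decimal place. The sketch below reproduces the listed figures under that assumption; the function name is illustrative, and round-half-up is assumed because it is the rounding that matches the 7.8 and 5.3 entries.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_ten_point_scale(score_out_of_four: str) -> Decimal:
    """Rescale a 4-point rating to a 10-point rating, one decimal place."""
    scaled = Decimal(score_out_of_four) * Decimal(10) / Decimal(4)
    return scaled.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Datapro overall satisfaction ratings as listed in the text.
datapro_scores = {"IDMS": "3.4", "ADABAS": "3.1", "IMS": "3.0", "TOTAL": "2.1"}

converted = {
    product: to_ten_point_scale(score)
    for product, score in datapro_scores.items()
}

for product, score in converted.items():
    print(f"{product:<7} {score} out of 10.0")
```

Decimal arithmetic is used instead of binary floating point so that the half-way cases (7.75 and 5.25) round the same way as in the report.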

Table 6 gives the details of the results of the survey for these
four products. Table 7 gives a summary of the users' opinions
concerning the product/vendor advantages or disadvantages. The
scores shown in Table 7 are based on the following:

    4   Excellent
    3   Good
    2   Fair
    1   Poor


The Datapro survey was conducted by McGraw-Hill Research
using a questionnaire designed by Datapro. The questionnaire
was mailed to users in May 1982, with a second mailing to
nonrespondents in June 1982. Telephone calls were made in
July and August to those users who did not respond to the
mail survey in order to achieve at least a 50% response rate.
The users were selected from a list of subscribers to Computerworld
with the following job titles and functions:

•  Director,  Manager,  Supervisor  of  Data  Processing  Services; 

•  Systems  Manager  and  Systems  Analyst; 

•  Manager  or  Supervisor  of  Programming. 

User  surveys  were  not  available  for  objective  determination 
of  the  following  four  systems: 

•  INGRES, 

•  CLIO, 

•  SEED, 

•  Model  204. 

Subjective  criteria  were  used  in  evaluating  these  systems. 

Because of the lack of user information concerning Model
204 by Computer Corporation of America and CLIO by United
Systems Software and Services, these two products are not
recommended. Datapro reports the first installation of CLIO
in 1982. CLIO is not recommended because of the lack of
demonstrated use of CLIO on very large data bases. Although
Model 204 was first installed in 1969, there are only 125
installations reported in Datapro. Because of the lack of user
reports on Model 204, it is considered a technical risk to
recommend this system.

INGRES  by  Relational  Technology  is  currently  being  used 
by  NORDA  on  their  DEC  VAX  computer.  The  users  of  this 
DBMS  at  NORDA  claim  that  for  very  small  data  bases  (less 
than  1  Mbyte)  INGRES  took  what  they  considered  an  excessively 
long  time  to  respond  to  requests. 

MITRE  Corporation  recently  tested  SEED  by  Seed  Software 
on  the  DEC  VAX  computer.  Two  PSI  employees  observed  the 
testing  and  were  included  in  the  analysis  of  the  results.  MITRE 
found  that  because  of  the  journaling  of  the  entries  SEED  was 
very  slow.  The  journaling  may  be  disabled  but  there  would  be 
no  lock-out  protection  against  concurrent  users  reading  and 
writing to the same record. This test was also done on a very
small data base.

In reviewing the surveys of both Datapro and Datamation,
two products were consistently rated highly by their users: IDMS
and ADABAS. While other DBMS products were also well
regarded by their users, these two products stood out. Because
of their high user ratings, these two products passed the final
level of discrimination.
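
One way to see why IDMS and ADABAS stand out is to line up the two surveys for the products that appear in both. The report does not give a formula for combining the surveys; the plain mean used below is purely an illustrative assumption, applied to the overall scores quoted earlier in this section.

```python
# Overall scores quoted in the text (both on a 10-point scale).
data_decisions = {
    "IDMS": 7.7, "ADABAS": 7.7, "TOTAL": 7.1,
    "IMS": 6.5, "SQL": 6.1, "DL1": 5.7,
}
datapro_scaled = {"IDMS": 8.5, "ADABAS": 7.8, "IMS": 7.5, "TOTAL": 5.3}

# Only products covered by both surveys can be compared this way.
in_both = sorted(set(data_decisions) & set(datapro_scaled))

# Assumed combination rule: unweighted mean of the two survey scores.
mean_score = {p: (data_decisions[p] + datapro_scaled[p]) / 2 for p in in_both}

ranked = sorted(in_both, key=lambda p: mean_score[p], reverse=True)
top_two = ranked[:2]

for product in ranked:
    print(f"{product:<7} {mean_score[product]:.2f}")
```

With these inputs the two highest-ranked products are IDMS and ADABAS, which agrees with the qualitative conclusion drawn above.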


Table 4. User opinion scores of products included in Data Decisions survey. Maximum score: 10.0.

                Number of    Overall satisfaction                  Vendor
                responses    Product    Support    Performance    support    Operational

All packages      1069         7.1        6.3          6.7          6.2          6.4
ADABAS             --          7.7        6.3          7.6          6.2          7.2
DL1                48          5.7        5.7          5.4          5.4          5.7
IDMS               61          7.7        6.0          6.9          7.3          7.2
IMS                35          6.5        6.2          6.1          6.1          6.5
TOTAL              53          7.1        6.0          7.4          5.0          6.5
SQL                37          6.1        5.7          5.7          5.4          5.9

Table 5. Product scores taken from Data Decisions survey.

                Number of    % rated DBMS    % rated vendor    % users seeking    % not satisfied
                responses    outstanding     outstanding       to replace         w/package

All packages      1089           77              64                 17                  3
ADABAS             65            91              63                  2                  0
DL1                46            56              58                 27                 10
IDMS               61            90              79                  2                  0
IMS                35            69              54                 11                  0
TOTAL              53            79              58                 38                  4
SQL                37            64              80                 18                  8


Table 6. Product scores taken from Datapro survey.

                                                   IDMS          ADABAS    TOTAL         IMS

Date of initial         before 1977                [illegible]     7         7           20
testing/use             1978-1979                   9              6         6            7
                        1980-1982                  38             27        10           23

Notification            Yes by vendor               7              3         3            6
required                Yes by user                 4              6         3            8
                        No                         46              2        21           37

Significant             Simple                     18             21        14            7
advantages              Flexible                   46             29        13           39
noted by                Inexpensive                 5              2         1            6
users                   Save system resources      19              9        12            9
                        Save human resources       41             23        18           37
                        Compatible                 30             11        16           33

Significant             Inflexible                  1              2         5            9
disadvantages           Costly                     18             11         6           24
noted by                Complex                    13              8         1           33
users                   Slow                        5              6         2           11
                        Use excessive resources     8              6         5           28
                        Lack key capabilities       1              2         6            6

Did package             Yes, immediately           41             20        19           34
perform as              Yes, eventually            14             10        [illegible]  13
required                Never                       1              1         -            1

Table 7. User opinion scores of products included in
Datapro survey. Maximum score: 4.0.

Average user rating     IDMS    ADABAS    TOTAL    IMS

Reliability             3.6      3.5       3.3     3.4
Efficiency              3.3      3.2       2.6     3.6
Ease of installation    3.2      3.3       3.1     2.6
Ease of use             3.2      3.4       3.0     2.6
Troubleshooting         3.1      2.6       2.6     3.1
Documentation           3.0      2.6       2.7     2.9
User education          3.0      2.8       2.7     2.8
Vendor maintenance      3.1      2.9       2.7     3.4

5. Recommendations

Two DBMS products, IDMS/R by Cullinet and ADABAS
by Software AG, passed the objective levels of discrimination
and received high scores in the subjective levels of discrimination.
These two products are candidates to be used with the
Geonames Data Base. The exact hardware/operating system on
which the DBMS will reside may change the candidate systems
because of compatibility problems.

The next step is to benchmark the candidate products. Benchmarks
will allow actual performance to be measured.
Selected queries of the type expected will be written, a data
base generated, and the systems compared. Benchmarking will
also allow the products to be compared for ease of use and user
friendliness.

There are two possible sources of data that could be used
in the benchmark. The U.S. Geological Survey (USGS)
Geographic Names Information System could be accessed and
several thousand names along with their associated data would
be used. This data base would be quite small by comparison
to the entire Geonames Data Base but should provide a very
similar type of data base. Multiset ID data tapes could also be
used for benchmarking of DBMS products. These tapes contain
over 100,000 names per tape and represent a large data
set for use. However, these tapes contain little additional data
besides the name, so access by other entries would be greatly
reduced in comparison to the data from USGS. In either of
the above cases the data base can be expanded with fictional
data through randomization or other programmatic processes.
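
Expanding a small seed set with fictional records could be done along the following lines. This is a minimal sketch only: the record layout (`name`, `feature`, `lat`, `lon`), the two seed records, and the `expand_records` helper are hypothetical and do not reflect the actual USGS or tape formats.

```python
import random


def expand_records(seed_records, target_count, rng):
    """Pad a non-empty seed list with fictional records up to target_count."""
    expanded = list(seed_records)
    while len(expanded) < target_count:
        base = rng.choice(seed_records)
        fictional = dict(base)
        # Derive a new, clearly artificial name from a real one.
        fictional["name"] = f"{base['name']} {rng.randrange(10**6):06d}"
        # Scatter the fictional feature anywhere on the globe.
        fictional["lat"] = round(rng.uniform(-90.0, 90.0), 4)
        fictional["lon"] = round(rng.uniform(-180.0, 180.0), 4)
        fictional["fictional"] = True
        expanded.append(fictional)
    return expanded


# Hypothetical seed records standing in for a real extract.
seed = [
    {"name": "Springfield", "feature": "populated place", "lat": 39.8, "lon": -89.6},
    {"name": "Bear Creek", "feature": "stream", "lat": 35.1, "lon": -106.6},
]

# A fixed seed keeps the benchmark data reproducible between runs.
benchmark_set = expand_records(seed, 1000, random.Random(1984))
print(len(benchmark_set))
```

Seeding the generator means every candidate DBMS can be loaded with exactly the same synthetic data, which keeps benchmark comparisons fair.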

6.  Conclusions 

There  are  commercial  DBMSs  available  that  are  capable  of 
managing  the  Geonames  Data  Base.  ADABAS  by  Software 
AG  and  IDMS/R  by  Cullinet  are  especially  good  candidates  due 
to  a  high  level  of  use  and  of  user  satisfaction.  The  decision 
of  which  system  to  purchase  should  be  made  at  a  higher  technical 
level  after  additional  DB  details  have  been  designed. 
Appendix A: Questionnaire for Planning Systems, Inc.
phone survey of DBMS vendors


Information  for  Geonames  Data  Base  DBMS  selection 


Company Name:
Company Rep/title:


Product  Name: 


Machines/OS  Product  is  used  with: 


Type  of  DBMS  (Relational/Hierarchical/Hybrid): 

Can  Product  handle: 

(a)  10  Gigabyte  size  base: 

(b)  60,000,000  records: 

Other  items  to  be  considered: 

1. Does product handle variable length fields? If so, how?

2. Does product support multiuser (25), multibases (150),
multidisks? How?

3. Does product support aliasing and name variant?

4. Does product support interactive and batch modes?

5. Where may application programs be used (front, back, front
and back, interior)? In what languages?

6. What are the maximum number/size of the fields, records,
and files?

7. Does product support data dictionary?

8. Does product support data security at record, file, and field
levels?

9. Does product support "read only" data security?

10. Does product support data independence?

11. Does product have an audit trail? At what level?

12. How does product support blank fields?

13. Does product give MIS? Of what type and level?

14. What are the backup/restart facilities?

15. Does product support priorities of users?

16. Does product support common working area?

17. Does product support intermediate messages and cancellation
procedures for long running request?

18. What type/cost of vendor support and/or training is
available?

19. Cost:

a. Purchase (GSA Rate)
b. Lease (GSA Rate)

20. Is source language available? (type, cost)

21. Are all DBM system languages included in cost?
(DDL, DML)
Appendix B: Additional notes on selected
DBMS products


This appendix contains comments on DBMS products that
were selected because of the features the products have or will
presently have. These products are

•  SQL/DB2 and IMS/DL1 by IBM

•  RAMIS II by Mathematica Products Group

•  IDM by Britton Lee

•  RAPPORT by Logics

•  GYPSY by University of Oklahoma

IBM currently has four DBMS products for use on their mini
and mainframe computers. Two of these products, DB2 and
SQL, use the relational data model, and two of these products,
IMS and DL1, use a hierarchical or network data model. The
primary difference between DB2 and SQL is the hardware/operating
system for which the product is written. The same
is true for IMS and DL1. Because the hardware/operating system
difference was not part of the selection/evaluation criteria, the
two relational products were evaluated as one product and the
two nonrelational products were evaluated as one product.

Both RAMIS II from Mathematica Products Group and
RAPPORT from Logics are to be upgraded in the future. The
representatives of these companies indicated the upgrades would
be such that the products should be able to overcome their
current limiting factors.

The Britton Lee IDM-500 is a data base machine rather than
a software DBMS, achieving good performance through its
hardware implementation of the relational data model. The IDM
product is limited to a maximum of 30 data bases. Although
this eliminates the IDM from further consideration based on
the proposed data partitioning into 150 bases corresponding to
gazetteers, it should be noted that the IDM is successfully being
used with a data base of 13 million names. The users of
that data base report excellent response time. If alternate data
partitioning designs are considered, the IDM should be
reconsidered as a possible candidate system for the Geonames Data
Base.

The DBMS GYPSY from the University of Oklahoma is used
by the USGS with their names data base. This product is not
a DBMS but a file management system and is used by the USGS
as such. However, it does have a proven history of being used
on a large data base of very similar nature.

IDMS*

Company:                      Cullinet Software, 400 Blue Hill Drive,
                              Westwood, Massachusetts 02090. Telephone
                              (617) 329-7700.

Functions:                    Data base management system.

Hardware Systems:             IBM: System/370, 3000, 4300.

Minimum Memory
Requirements:                 500K.

Operating System:             OS, DOS, VS(E) counterparts, VM.

Source Language:              Assembler.

Pricing:                      Contact vendor.

Options:                      Central Version, CMS, Distributed Database
                              System, DMS interface, Escape/DL/1,
                              Escape/DBOMP, Escape/Total.

Maintenance:                  First year free; 10 percent of license fee
                              annually thereafter.

Documentation:                Included in price.

Training:                     Included in price plus expenses.

Number of
Current Users:                800.

Date of First
Installation:                 May 1973.

IDMS/R is the current release of Cullinet's DBMS. It contains
within it the previously released DBMS, IDMS.

IDMS (Integrated Database Management System) is a data
base management system designed to conform with the
CODASYL Data Base Task Group Language specifications. It
includes a schema data description language (schema DDL), a
subschema data description language (subschema DDL), a
device/media control language (DMCL), and a data manipulation
language (DML), as well as the data base management
(DBMS) modules themselves. IDMS also includes a data dictionary
system, which operates from the user-established schema
definition of data and a series of data administrator utility
programs. The data description language (DDL) is stand-alone and
its record descriptions are comparable to those of Cobol. The
schema DDL input provided by the data manager completely
defines the data base. The data base description is composed
of areas, files, records, and logical relationships. The schema
compiler validates and stores the schema DDL information in
the data dictionary. The subschema compiler generates a series
of tables that are maintained in a catalogued file for later
interpretation by the data management routine. The subschema is
that portion of the data base known to a particular applications
program.

Special features include path calls, multiple dictionary support,
high speed terminal response time, reentrance, record journaling,
concurrent update prevention, and automatic recovery
with warm start. This last feature allows all unaffected programs
to continue normal processing when an individual program fails.
The teleprocessing monitor, IDMS-DC, provides a data
communications capability integrated with IDMS. IDMS-DC is
dictionary driven and designed specifically for use in the DBMS
environment. The Distributed Database System is an option that
allows application programs running on multiple CPUs to access
and update a common data base. Data integrity is guaranteed,
and there is full recovery capability at each machine for
application program failure and for machine failure. The DMS
Interface, Escape/DL/1, and Escape/DBOMP are a set of interfaces
that allow DMS, DL/1, and DBOMP users to access and
update an IDMS data base. All of the flexibility, integrity, and
security features of IDMS are available to these users.

* Taken from Datapro Directory of Software.


ADABAS*

Company:                      Software AG of North America, Inc.,
                              Reston International Center, 11800
                              Sunrise Valley Drive, Suite 917, Reston,
                              Virginia 22091. Telephone (703)
                              860-5050.

Functions:                    Data base management system.

Hardware Systems:             IBM: System/370, 303X, 308X, 4300;
                              Siemens: 4004.

Minimum Memory
Requirements:

Operating System:             IBM: OS, DOS, VS, DOS/VS(E), MVS,
                              VM/370-CMS; Siemens: PPS, BS1000.

Time-Sharing
Service:                      Software AG, PRC Computer Center,
                              Inc.

No. of Programs
in Package:

Source Language:

Source Listings:              Are not available.

Pricing:                      Purchase: $162,000 (MVS), $132,000
                              (OS), $99,000 (DOS).

Maintenance:                  Provided for first year; 10 percent of the
                              then current price thereafter.

Documentation:                Provided with purchase or lease.

Training:                     Included with purchase; available with
                              lease.

Number of
Current Users:

Date of First
Installation:                 July 1972.


ADABAS (Adaptable Data Base System) is a data base management
system with a number of utility programs used under DOS
or OS with BDAM for data base generation and access. The
system uses a variety of high-efficiency data management techniques
and provides a generalized file coupling capability. The
ADABAS nucleus supports concurrent batch and on-line processing.
Included with ADABAS is ADASCRIPT, an on-line query
language with English-like syntax. Interfaces are provided for
popular TP monitors such as COM-PLETE, CICS, TSO, and
Taskmaster. ADACOM, a report generator, is also included. A
data compression algorithm to load into the data base is an
integral function of the system. Also featured is the separation
of physical data storage from the representation of logical
relationships in the data base. ADAMINT is used to generate
high-level interface routines for applications programs. ADABAS also
includes an integrated Data Dictionary system and full
restart/recovery capabilities.

ADABAS/VM is the version of the product that operates
under the CMS component of VM/370. ADABAS/VTAM is
the DDP product that allows applications running in one
processor to access data in one or more secondary processors
connected via a channel-to-channel or VTAM network line.


*  Taken  from  Datapro  Directory  of  Software 
[Table: Methods of speed enhancement claimed by vendors, covering
IDM, IDMS, ADABAS, MODEL 204*, TOTAL, CLIO, SEED, INGRES*, and
IMS/DL1. * Taken from direct quotes.]

Distribution  List 


Department  of  the  Navy 
Asst  Secretary  of  the  Navy 
(Research Engineering & System)
Washington  DC  20350 

Department  of  the  Navy 
Chief  of  Naval  Operations 
ATTN:  OP-951 
Washington  DC  20350 

Department  of  the  Navy 
Chief  of  Naval  Operations 
ATTN:  OP-952 
Washington  DC  20350 

Department  of  the  Navy 
Chief  of  Naval  Operations 
ATTN:  OP987 
Washington  DC  20350 

Department of the Navy
Chief  of  Naval  Material 
Washington  DC  20360 

Commander 

Naval  Air  Development  Center 
Warminster  PA  18974 

Commander 

Naval  Air  Systems  Command 
Headquarters 
Washington  DC  20361 

Commanding  Officer 
Naval  Coastal  Systems  Center 
Panama  City  FL  32407 

Commander 

Naval  Electronic  Systems  Com 

Headquarters 

Washington  DC  20360 

Commanding  Officer 
Naval  Environmental  Prediction 
Research  Facility 
Monterey  CA  93940 

Commander 

Naval  Facilities  Eng  Command 
Headquarters 
200  Stovall  Street 
Alexandria  VA  22332 


Commanding  Officer 
Naval  Ocean  R  &  D  Activity 
ATTN:  Codes  100/111/112 
NSTL  MS  39529 

Commanding  Officer 
Naval  Ocean  R  &  D  Activity 
ATTN:  Code  113 
NSTL  MS  39529 

Commanding  Officer 
Naval  Ocean  R  &  D  Activity 
ATTN:  Code  125L 
NSTL  MS  39529 

Commanding  Officer 
Naval Ocean R & D Activity
ATTN:  Code  125ED 
NSTL  MS  39529 

Commanding  Officer 
Naval Ocean R & D Activity
ATTN:  Code  110 
NSTL  MS  39529 

Commanding  Officer 
Naval Ocean R & D Activity
ATTN:  Code  105 
NSTL  MS  39529 

Commanding  Officer 
Naval Ocean R & D Activity
ATTN:  Code  115 
NSTL  MS  39529 

Commanding  Officer 
Naval  Ocean  R  &  D  Activity 
ATTN:  Code  200 
NSTL  MS  39529 

Commanding  Officer 
Naval  Ocean  R  &  D  Activity 
ATTN:  Code  300 
NSTL  MS  39529 

Commanding  Officer 
Naval  Research  Laboratory 
Washington  DC  20375 

Commander
Naval  Oceanography  Command
NSTL  MS  39529
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Commanding  Officer 
Fleet  Numerical  Ocean  Cen 
Monterey  CA  93940 

Commanding  Officer 
Naval  Oceanographic  Office 
NSTL  MS  39522 

Commander
Naval  Ocean  Systems  Center
San  Diego  CA  92152 

Commanding  Officer 
ONR  Branch  Office  LONDON 
Box  39
FPO  New  York  09510

Officer  in  Charge 
Office  of  Naval  Research 
Detachment,  Pasadena
1030  E.  Green  Street 
Pasadena  CA  91106 

Commander
Naval  Sea  Systems  Command
Headquarters 
Washington  DC  20362 

Commander
D.  W.  Taylor  Naval  Ship  R&D  Cen
Bethesda  MD  20084 

Commander
Naval  Surface  Weapons  Center
Dahlgren  VA  22448 

Commanding  Officer 
Naval  Underwater  Systems  Center
ATTN:  NEW  LONDON  LAB 
Newport  RI  02841 

Superintendent
Naval  Postgraduate  School
Monterey  CA  93940

Project  Manager 
ASW  Systems  Project  (PM-4)
Department  of  the  Navy 
Washington  DC  20360 


Department  of  the  Navy 
Deputy  Chief  of  Naval  Material 
for  Laboratories 
Rm  866  Crystal  Plaza  Five 
Washington  DC  20360 

Officer  in  Charge 
Naval  Underwater  Sys  Cen  Det 
New  London  Laboratory 
New  London  CT  06320 

Defense  Technical  Info  Cen 
Cameron  Station 
Alexandria  VA  22314 

Director
Chief  of  Naval  Research
ONR  Code  420 
NSTL  MS  39529 

Director,  Liaison  Office
Naval  Ocean  R&D  Activity 
800  N.  Quincy  Street 
Ballston  Tower  #1
Arlington  VA  22217 

Department  of  the  Navy 
Office  of  Naval  Research 
ATTN:  Code  102 
800  N.  Quincy  Street 
Arlington  VA  22217 

Director
Woods  Hole  Oceanographic  Inst
86-96  Water  St.
Woods  Hole  MA  02543

Director
University  of  California
Scripps  Institution  of  Oceanography
P.O.  Box  6049
San  Diego  CA  92106 

Working  Collection 
Texas  A  &  M  University
Department  of  Oceanography 
College  Station  TX  77843 


Director
Defense  Mapping  Agency
Washington,  DC  20305

Director
Defense  Mapping  Agency
Aerospace  Cen
St.  Louis  Air  Force  Station,  MO  63118

Director
Defense  Mapping  Agency
Hydrographic/Topographic  Cen
6500  Brooke  Lane
Washington,  DC  20315

Director
Defense  Mapping  Agency
Special  Program  Office  of
Exploitation  and  Modernization
8301  Greensboro  Drive,  Suite  1100
McLean,  VA  22102
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